
    Robust Subgraph Generation Improves Abstract Meaning Representation Parsing

    The Abstract Meaning Representation (AMR) is a representation for open-domain rich semantics, with potential use in fields like event extraction and machine translation. Node generation, typically done using a simple dictionary lookup, is currently an important limiting factor in AMR parsing. We propose a small set of actions that derive AMR subgraphs by transformations on spans of text, which allows for more robust learning of this stage. Our set of construction actions generalizes better than the previous approach, and can be learned with a simple classifier. We improve on the previous state-of-the-art result for AMR parsing, boosting end-to-end performance by 3 F1 on both the LDC2013E117 and LDC2014T12 datasets. Comment: To appear in ACL 2015.
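
    The abstract does not spell out the action inventory, so the following Python sketch is only illustrative: hypothetical action names (LEMMA, DICT, IDENTITY), toy lookup tables, and a priority-ordered fallback standing in for the learned classifier that chooses among actions.

        # Minimal sketch of span-to-subgraph node generation via a small
        # action set. Action names, tables, and the priority-order chooser
        # are hypothetical stand-ins for the paper's learned classifier.
        LEMMAS = {"wanted": "want-01", "visited": "visit-01"}  # toy frame table
        PHRASES = {"new york": "city"}                         # toy dictionary

        def lemma(span):        # map the span to a PropBank-style frame
            return LEMMAS.get(span.lower())

        def dict_lookup(span):  # fall back to a phrase dictionary
            return PHRASES.get(span.lower())

        def identity(span):     # use the span text itself as the node
            return span.lower()

        ACTIONS = [("LEMMA", lemma), ("DICT", dict_lookup), ("IDENTITY", identity)]

        def generate_node(span):
            """Try actions in priority order; a classifier would choose instead."""
            for name, action in ACTIONS:
                node = action(span)
                if node is not None:
                    return name, node

        print(generate_node("visited"))   # ('LEMMA', 'visit-01')
        print(generate_node("New York"))  # ('DICT', 'city')
        print(generate_node("boy"))       # ('IDENTITY', 'boy')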

    A large annotated corpus for learning natural language inference

    Understanding entailment and contradiction is fundamental to understanding natural language, and inference about entailment and contradiction is a valuable testing ground for the development of semantic representations. However, machine learning research in this area has been dramatically limited by the lack of large-scale resources. To address this, we introduce the Stanford Natural Language Inference corpus, a new, freely available collection of labeled sentence pairs, written by humans doing a novel grounded task based on image captioning. At 570K pairs, it is two orders of magnitude larger than all other resources of its type. This increase in scale allows lexicalized classifiers to outperform some sophisticated existing entailment models, and it allows a neural network-based model to perform competitively on natural language inference benchmarks for the first time. Comment: To appear at EMNLP 2015. The data will be posted shortly before the conference (the week of 14 Sep) at http://nlp.stanford.edu/projects/snli
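
    As a concrete illustration of the kind of lexicalized baseline the abstract mentions, here is a minimal sketch that reads one SNLI-style JSONL record (the field names sentence1, sentence2, and gold_label follow the corpus release) and applies a crude word-overlap rule; the threshold and rule are illustrative, not a trained classifier.

        import json

        # One toy record in the SNLI JSONL format.
        line = ('{"sentence1": "A man is eating food.", '
                '"sentence2": "A man is eating.", "gold_label": "entailment"}')

        def tokens(s):
            return {w.strip(".,").lower() for w in s.split()}

        def overlap(premise, hypothesis):
            """Fraction of hypothesis tokens that also appear in the premise."""
            p, h = tokens(premise), tokens(hypothesis)
            return len(p & h) / max(len(h), 1)

        ex = json.loads(line)
        score = overlap(ex["sentence1"], ex["sentence2"])
        # A fixed threshold stands in for a trained lexicalized classifier.
        pred = "entailment" if score > 0.7 else "neutral"
        print(pred, ex["gold_label"])  # entailment entailment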

    The SNLI Corpus

    The SNLI corpus (version 1.0) is a collection of 570k human-written English sentence pairs manually labeled for balanced classification with the labels entailment, contradiction, and neutral, supporting the task of natural language inference (NLI), also known as recognizing textual entailment (RTE). We aim for it to serve both as a benchmark for evaluating representational systems for text, especially including those induced by representation learning methods, and as a resource for developing NLP models of any kind. We gratefully acknowledge support from a Google Faculty Research Award, a gift from Bloomberg L.P., the Defense Advanced Research Projects Agency (DARPA) Deep Exploration and Filtering of Text (DEFT) Program under Air Force Research Laboratory (AFRL) contract no. FA8750-13-2-0040, the National Science Foundation under grant no. IIS 1159679, and the Department of the Navy, Office of Naval Research, under grant no. N00014-10-1-0109. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of Google, Bloomberg L.P., DARPA, AFRL, NSF, ONR, or the US government. We also thank our many excellent Mechanical Turk contributors.

    Relation Extraction with Self-determined Graph Convolutional Network

    Relation Extraction is a way of obtaining the semantic relationship between entities in text. State-of-the-art methods use linguistic tools to build a graph over the text in which the entities appear, and then employ a Graph Convolutional Network (GCN) to encode the pre-built graph. Although their performance is promising, the reliance on linguistic tools results in a non-end-to-end process. In this work, we propose a novel model, the Self-determined Graph Convolutional Network (SGCN), which determines a weighted graph using a self-attention mechanism, rather than relying on any linguistic tool. Then, the self-determined graph is encoded using a GCN. We test our model on the TACRED dataset and achieve the state-of-the-art result. Our experiments show that SGCN outperforms the traditional GCN, which uses dependency parsing tools to build the graph. Comment: CIKM 2020.
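
    A minimal numpy sketch of the core idea, assuming single-head scaled dot-product self-attention to induce the weighted adjacency matrix and one GCN layer on top; the dimensions, initialization, and single head are illustrative, not the paper's exact architecture.

        import numpy as np

        rng = np.random.default_rng(0)
        n, d = 5, 8                      # tokens and hidden size (toy values)
        H = rng.normal(size=(n, d))      # token representations
        Wq, Wk, Wg = [rng.normal(size=(d, d)) * 0.1 for _ in range(3)]

        def softmax(x, axis=-1):
            e = np.exp(x - x.max(axis=axis, keepdims=True))
            return e / e.sum(axis=axis, keepdims=True)

        # Self-determined graph: attention scores act as a soft, weighted
        # adjacency matrix in place of a dependency parse.
        A = softmax((H @ Wq) @ (H @ Wk).T / np.sqrt(d))

        # One GCN layer over the induced graph: aggregate neighbors,
        # transform, then apply a ReLU nonlinearity.
        H_out = np.maximum(A @ H @ Wg, 0.0)
        print(H_out.shape)  # (5, 8)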

    Language-Independent Discriminative Parsing of Temporal Expressions

    Temporal resolution systems are traditionally tuned to a particular language, requiring significant human effort to translate them to new languages. We present a language-independent semantic parser for learning the interpretation of temporal phrases given only a corpus of utterances and the times they reference. We make use of a latent parse that encodes a language-flexible representation of time, and extract rich features over both the parse and the associated temporal semantics. The parameters of the model are learned using a weakly supervised bootstrapping approach, without the need for manually tuned parameters or any other language expertise. We achieve state-of-the-art accuracy on all languages in the TempEval-2 temporal normalization task, reporting a 4% improvement in both English and Spanish accuracy, and, to our knowledge, the first results for four other languages.
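
    The training signal described here, where only (phrase, reference time) pairs are observed and the parse is latent, can be caricatured in a few lines. Everything below (the rules, interpretations, and counting scheme) is a hypothetical stand-in for the paper's grammar and learner.

        from collections import Counter

        # Each example pairs a phrase with the gold time it referenced;
        # the parse that produced that time is latent.
        DATA = [("next friday", "2013-06-14"), ("last friday", "2013-05-31")]

        def candidates(phrase):
            """Enumerate latent (rule, interpretation) parses for a phrase."""
            parses = [("NOISE_RULE", "1970-01-01")]
            if phrase.startswith("next"):
                parses.append(("NEXT_DOW", "2013-06-14"))
            if phrase.startswith("last"):
                parses.append(("LAST_DOW", "2013-05-31"))
            return parses

        weights = Counter()
        for _ in range(3):  # bootstrap iterations
            for phrase, gold in DATA:
                # Credit every latent parse whose interpretation matches the
                # gold time; a real learner would renormalize into parse
                # probabilities rather than count.
                for rule, time in candidates(phrase):
                    if time == gold:
                        weights[rule] += 1

        print(weights)  # correct rules accumulate weight; NOISE_RULE gets none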

    Philosophers are Mortal: Inferring the Truth of Unseen Facts

    Large databases of facts are prevalent in many applications. Such databases are accurate, but as they broaden their scope they become increasingly incomplete. In contrast to extending such a database, we present a system to query whether it contains an arbitrary fact. This work can be thought of as re-casting open-domain information extraction: rather than growing a database of known facts, we smooth this data into a database in which any possible fact has membership with some confidence. We evaluate our system by predicting held-out facts, achieving 74.2% accuracy and outperforming multiple baselines. We also evaluate the system as a commonsense filter for the ReVerb Open IE system, and as a method for answer validation in a Question Answering task.
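
    A minimal sketch of the "smoothing" view, where an unseen triple gains membership confidence from its similarity to known facts; the toy facts and the slot-overlap similarity are illustrative only, not the paper's model.

        # Score an unseen triple by its best overlap with known facts.
        KNOWN = {("socrates", "is", "mortal"),
                 ("plato", "is", "mortal"),
                 ("socrates", "is", "philosopher")}

        def similarity(a, b):
            """Fraction of (subject, relation, object) slots that agree."""
            return sum(x == y for x, y in zip(a, b)) / 3.0

        def confidence(query):
            """Membership confidence: best similarity to any known fact."""
            return max(similarity(query, fact) for fact in KNOWN)

        print(round(confidence(("plato", "is", "philosopher")), 2))  # 0.67
        print(round(confidence(("rock", "has", "wheels")), 2))       # 0.0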

    Combining Distant and Partial Supervision for Relation Extraction

    Broad-coverage relation extraction either requires expensive supervised training data, or suffers from drawbacks inherent to distant supervision. We present an approach for providing partial supervision to a distantly supervised relation extractor using a small number of carefully selected examples. We compare against established active learning criteria and propose a novel criterion to sample examples which are both uncertain and representative. In this way, we combine the benefits of fine-grained supervision for difficult examples with the coverage of a large distantly supervised corpus. Our approach gives a substantial increase of 3.9% end-to-end F1 on the 2013 KBP Slot Filling evaluation, yielding a net F1 of 37.7%.
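
    A minimal sketch of a selection criterion in this spirit, scoring pool examples by uncertainty (entropy of the model's label posterior) plus representativeness (mean cosine similarity to the rest of the pool); the features, posteriors, and equal weighting are illustrative, not the paper's exact criterion.

        import numpy as np

        rng = np.random.default_rng(0)
        pool = rng.normal(size=(100, 16))            # candidate feature vectors
        probs = rng.dirichlet(np.ones(3), size=100)  # model label posteriors

        def entropy(p):
            """Shannon entropy of each row of label probabilities."""
            return -(p * np.log(p + 1e-12)).sum(axis=-1)

        # Representativeness: mean cosine similarity to the rest of the pool.
        unit = pool / np.linalg.norm(pool, axis=1, keepdims=True)
        rep = (unit @ unit.T).mean(axis=1)

        # Combined criterion: prefer examples that are both uncertain
        # and representative.
        score = entropy(probs) + rep
        to_annotate = np.argsort(-score)[:10]  # top 10 sent for human labels
        print(to_annotate)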
